AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Roy, Somjit, Jaiswal, Prateek, Bhattacharya, Anirban, Pati, Debdeep, Mallick, Bani K.

Frequentist Regret Analysis of Gaussian Process Thompson Sampling via Fractional Posteriors

arXiv.org Machine LearningFeb-17-2026

We study Gaussian Process Thompson Sampling (GP-TS) for sequential decision-making over compact, continuous action spaces and provide a frequentist regret analysis based on fractional Gaussian process posteriors, without relying on domain discretization as in prior work. We show that the variance inflation commonly assumed in existing analyses of GP-TS can be interpreted as Thompson Sampling with respect to a fractional posterior with tempering parameter $α\in (0,1)$. We derive a kernel-agnostic regret bound expressed in terms of the information gain parameter $γ_t$ and the posterior contraction rate $ε_t$, and identify conditions on the Gaussian process prior under which $ε_t$ can be controlled. As special cases of our general bound, we recover regret of order $\tilde{\mathcal{O}}(T^{\frac{1}{2}})$ for the squared exponential kernel, $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}} )$ for the Matérn-$ν$ kernel, and a bound of order $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}})$ for the rational quadratic kernel. Overall, our analysis provides a unified and discretization-free regret framework for GP-TS that applies broadly across kernel classes.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2602.14472

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.14)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Neural Information Processing SystemsFeb-8-2026, 06:27:24 GMT

39799c18791e8d7eb29704fc5bc04ac8-Supplemental.pdf

exp null 1 2, matrix, positive rate, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Neural Information Processing SystemsNov-20-2025, 03:41:41 GMT

Discretely Beyond 1 /e: Guided Combinatorial Algorithms for Submodular Maximization

These are achieved by guiding the randomized greedy algorithm with a fast local search algorithm. Further, we develop deterministic versions of these algorithms, maintaining the same ratio and asymptotic time complexity.

algorithm, artificial intelligence, constraint, (14 more...)

Country: North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Neural Information Processing SystemsOct-10-2025, 16:04:15 GMT

Discretely Beyond 1 /e: Guided Combinatorial Algorithms for Submodular Maximization

algorithm, constraint, matroid constraint, (13 more...)

Country: North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Wortsman, Arie, Loureiro, Bruno

Kernel ridge regression under power-law data: spectrum and generalization

arXiv.org Machine LearningOct-7-2025

In this work, we investigate high-dimensional kernel ridge regression (KRR) on i.i.d. Gaussian data with anisotropic power-law covariance. This setting differs fundamentally from the classical source & capacity conditions for KRR, where power-law assumptions are typically imposed on the kernel eigen-spectrum itself. Our contributions are twofold. First, we derive an explicit characterization of the kernel spectrum for polynomial inner-product kernels, giving a precise description of how the kernel eigen-spectrum inherits the data decay. Second, we provide an asymptotic analysis of the excess risk in the high-dimensional regime for a particular kernel with this spectral behavior, showing that the sample complexity is governed by the effective dimension of the data rather than the ambient dimension. These results establish a fundamental advantage of learning with power-law anisotropic data over isotropic data. To our knowledge, this is the first rigorous treatment of non-linear KRR under power-law data.

eigenvalue, kernel, spectrum, (17 more...)

2510.0478

Country:

Europe > France (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

arXiv.org Machine LearningSep-16-2025

Some Robustness Properties of Label Cleaning

Cheng, Chen, Duchi, John

We demonstrate that learning procedures that rely on aggregated labels, e.g., label information distilled from noisy responses, enjoy robustness properties impossible without data cleaning. This robustness appears in several ways. In the context of risk consistency -- when one takes the standard approach in machine learning of minimizing a surrogate (typically convex) loss in place of a desired task loss (such as the zero-one mis-classification error) -- procedures using label aggregation obtain stronger consistency guarantees than those even possible using raw labels. And while classical statistical scenarios of fitting perfectly-specified models suggest that incorporating all possible information -- modeling uncertainty in labels -- is statistically efficient, consistency fails for ``standard'' approaches as soon as a loss to be minimized is even slightly mis-specified. Yet procedures leveraging aggregated information still converge to optimal classifiers, highlighting how incorporating a fuller view of the data analysis pipeline, from collection to model-fitting to prediction time, can yield a more robust methodology by refining noisy signals.

aggregation, consistency, inequality, (16 more...)

2509.11379

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

Neural Information Processing SystemsAug-15-2025, 09:15:41 GMT

Supplementary Material A Proof of Paper Results

We now consider the gradient of the log-variance loss. Using the definition from Eq. 5, we see that From [Reiss, 2012, Lemma A.3.5] we have the bound Combining these estimates we arrive at the claimed result. The claim follows by direct calculation. In fact, it is possible to take K = 15 . Lemma 3 shows that the kurtosis term in our bound Eq. 16 can be bounded for Gaussian families.

argrad, estimator, variance, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Noskov, Fedor, Puchkin, Nikita, Spokoiny, Vladimir

Dimension-free bounds in high-dimensional linear regression via error-in-operator approach

arXiv.org Machine LearningFeb-21-2025

In contrast to the standard intuition that a learner should search for a trade-off between approximation and estimation errors, researchers empirically observed that large interpolating rules may still have small test error. Moreover, when the number of parameters exceeds sample size the prediction risk of neural networks passes a U-shaped curve and decreases again [Zhang et al., 2017, Nakkiran et al., 2020]. A bit later it became clear that benign overfitting and double descent are not distinctive features of deep learning. Similar phenomena are ubiquitous for overparametrized models such as random forests and random feature models [Belkin et al., 2019, Mei and Montanari, 2022], kernel methods [Belkin et al., 2018, Liang and Rakhlin, 2020], and linear regression [Bartlett et al., 2020, Hastie et al., 2022] to name a few. In [Belkin et al., 2018], the authors reasonably suggested that we must study more tractable "shallow" methods better before diving into deep learning theory. In the present paper, we consider a classical linear regression problem, where a learner aims to estimate an unknown vector θ R d from i.i.d.

dnull 1 2, inequality, nullnull 1 2, (13 more...)

2502.15437

Country:

North America > United States (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Yu, Zhan, Ho, Daniel W. C.

Estimates on Learning Rates for Multi-Penalty Distribution Regression

arXiv.org Machine LearningJun-16-2020

This paper is concerned with functional learning by utilizing two-stage sampled distribution regression. We study a multi-penalty regularization algorithm for distribution regression under the framework of learning theory. The algorithm aims at regressing to real valued outputs from probability measures. The theoretical analysis on distribution regression is far from maturity and quite challenging, since only second stage samples are observable in practical setting. In the algorithm, to transform information from samples, we embed the distributions to a reproducing kernel Hilbert space $\mathcal{H}_K$ associated with Mercer kernel $K$ via mean embedding technique. The main contribution of the paper is to present a novel multi-penalty regularization algorithm to capture more features of distribution regression and derive optimal learning rates for the algorithm. The work also derives learning rates for distribution regression in the nonstandard setting $f_{\rho}\notin\mathcal{H}_K$, which is not explored in existing literature. Moreover, we propose a distribution regression-based distributed learning algorithm to face large-scale data or information challenge. The optimal learning rates are derived for the distributed learning algorithm. By providing new algorithms and showing their learning rates, we improve the existing work in different aspects in the literature.

artificial intelligence, distribution regression, machine learning, (19 more...)

2006.09017

Country: Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.75)